Abstract
This project analyzes the behavior of the euro against the US dollar (EUR/USD) around the release of the Gross Domestic Product (GDP), a key indicator of an economy's current health.
Algorithmic trading is a method of executing orders using automated, pre-programmed trading instructions that account for variables such as time, price, and volume. This type of trading attempts to leverage the speed and computational resources of computers relative to human traders. In the twenty-first century, algorithmic trading has been gaining traction with both retail and institutional traders. It is widely used by investment banks, pension funds, mutual funds, and hedge funds that may need to spread out the execution of a larger order or perform trades too fast for human traders to react to. A 2019 study showed that around 92% of trading in the forex market was performed by trading algorithms rather than humans. (Lemke and Lins, 2015)
Examples of strategies used in algorithmic trading include market making, inter-market spreading, arbitrage, and pure speculation such as trend following. Many fall into the category of high-frequency trading (HFT), which is characterized by high turnover and high order-to-trade ratios. HFT strategies utilize computers that make elaborate decisions to initiate orders based on information received electronically, before human traders are capable of processing the information they observe. As a result, in February 2012, the Commodity Futures Trading Commission (CFTC) formed a special working group of academics and industry experts to advise the CFTC on how best to define HFT. Algorithmic trading and HFT have dramatically changed market microstructure, particularly the way liquidity is provided, and increased the complexity and uncertainty of market macrodynamics. (Alabama Law Review, 2009)
High-frequency trading (HFT) is a type of algorithmic financial trading characterized by high speeds, high turnover rates, and high order-to-trade ratios that leverages high-frequency financial data and electronic trading tools. While there is no single definition of HFT, among its key attributes are highly sophisticated algorithms, co-location, and very short-term investment horizons. HFT can be viewed as a primary form of algorithmic trading in finance. Specifically, it is the use of sophisticated technological tools and computer algorithms to rapidly trade securities. HFT uses proprietary trading strategies carried out by computers to move in and out of positions in seconds or fractions of a second. (Aldridge, 2013)
To run this notebook, the packages listed in the requirements.txt file must be installed. The notebook also depends on the following project files:
import data as dt  # data loading module
import functions as fn  # analysis helper functions
Economic indicator: Gross Domestic Product (GDP)
Volatility or impact: High
Economy: European
Asset to explore: EUR/USD
This indicator was chosen to understand its relevance and impact on the EUR/USD parity today, when a recession is near, inflationary pressures are strong, and there are energy crises; all of this is reflected in the GDP of the country, or in this case, of the European economy as a whole. The indicator is published and validated by Eurostat, the European Commission's statistics office, and is an estimate of the total value of goods and services produced in the Eurozone. GDP is considered a measure of overall economic activity and indicates the rate of growth of a country's economy. A reading higher than expectations is bullish for the euro, while a lower reading is bearish. Keep in mind that GDP is released quarterly, so two consecutive quarters of lower or negative readings constitute a technical recession.
It is important to mention that this indicator has preliminary publications made prior to the final one: a first preliminary publication of GDP for a given quarter is made one month before the final publication; a second preliminary publication, with its due adjustments for the same quarter, is made two weeks after the first; and finally, approximately two weeks later, the final and official GDP figure for the country or economic zone is published.
To understand the validations visually, it is necessary to take the following into account:
#Visualization
dt.data_PIB.head()
| | Caracteristica | Fecha | Promedio móvil | Periodo | FECHA | ACTUAL | Consenso | Previo |
|---|---|---|---|---|---|---|---|---|
| 0 | P1 | 30-abr-20 | NaN | Q1-20 | 1 | -0.038 | -3.50% | 0.10% |
| 1 | P2 | 15-may-20 | -0.037333 | Q1-20 | 2 | -0.038 | -3.80% | -3.80% |
| 2 | Oficial | 09-jun-20 | -0.065000 | Q1-20 | 3 | -0.036 | -3.80% | -0.10% |
| 3 | P1 | 31-jul-20 | -0.092667 | Q2-20 | 4 | -0.121 | -12% | -3.60% |
| 4 | P2 | 14-ago-20 | -0.120000 | Q2-20 | 5 | -0.121 | -12.10% | -12.10% |
We will take preliminary announcements as blue lines and official announcements as yellow lines.

Looking for patterns, we can see that on the day of preliminary announcement 1, of preliminary announcement 2, and of the official announcement, the candles of those days share the same trend, whether bearish or bullish, as can be observed in the following validations:

To a greater or lesser extent the pattern holds and can be used in trading strategies on EUR/USD, with the caveat that the percentage change or numerical value in each publication should also be taken into account to have more certainty about the strength of the move.
In 2021, unlike all the previous cases, there was a discrepancy: the pattern did not occur, which suggests refining our pattern and focusing on the last candle:
In short, our pattern can be stated as follows: if the candle of the first preliminary announcement and the candle of the second preliminary announcement show the same behavior, the third candle, on the day of the official announcement, will show the same behavior.
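As a minimal sketch of this rule (the function names and prices below are illustrative, not part of the project's code), the pattern can be expressed as:

```python
# Illustrative sketch of the candle-direction pattern described above.
# Candles are (open, close) pairs; all names and prices are hypothetical.

def candle_direction(open_price, close_price):
    """+1 for a bullish daily candle, -1 for bearish, 0 for a doji."""
    if close_price > open_price:
        return 1
    if close_price < open_price:
        return -1
    return 0

def predict_official(p1_candle, p2_candle):
    """If the two preliminary-release candles agree in direction,
    predict the same direction for the official-release candle;
    otherwise emit no signal (0)."""
    d1 = candle_direction(*p1_candle)
    d2 = candle_direction(*p2_candle)
    return d1 if d1 == d2 and d1 != 0 else 0

# Both preliminary candles bearish -> expect a bearish official candle
print(predict_official((1.100, 1.090), (1.095, 1.088)))  # -1
```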
This section uses only the "current" values of the economic indicator for the past two years.
Economic Indicator : Euro area Gross Domestic Product (GDP)
This indicator translates into the monetary value of goods (from food products, vehicles, and machinery to textiles) and services (such as health, education, etc.) produced at the national level during a certain period of time. Thanks to GDP, we can evaluate the evolution of private consumption, investment, public spending, and the trade balance. Thus, if a country has a positive GDP reading, we can interpret that it is experiencing economic growth, which stimulates investment, job creation, and so on.
Data: we focused on Euro area GDP, using quarterly data from the last two years. Only the "current" value time series was considered.
Visualization of the original data:
dt.data_PIB.head(5)
| | Caracteristica | Fecha | Promedio móvil | Periodo | FECHA | ACTUAL | Consenso | Previo |
|---|---|---|---|---|---|---|---|---|
| 0 | P1 | 30-abr-20 | NaN | Q1-20 | 1 | -0.038 | -3.50% | 0.10% |
| 1 | P2 | 15-may-20 | -0.037333 | Q1-20 | 2 | -0.038 | -3.80% | -3.80% |
| 2 | Oficial | 09-jun-20 | -0.065000 | Q1-20 | 3 | -0.036 | -3.80% | -0.10% |
| 3 | P1 | 31-jul-20 | -0.092667 | Q2-20 | 4 | -0.121 | -12% | -3.60% |
| 4 | P2 | 14-ago-20 | -0.120000 | Q2-20 | 5 | -0.121 | -12.10% | -12.10% |
dt.data_PIB.tail(5)
| | Caracteristica | Fecha | Promedio móvil | Periodo | FECHA | ACTUAL | Consenso | Previo |
|---|---|---|---|---|---|---|---|---|
| 22 | P2 | 15-feb-22 | 0.003000 | Q4-21 | 23 | 0.003 | 0.30% | 0.30% |
| 23 | Oficial | 08-mar-22 | 0.001333 | Q4-21 | 24 | 0.003 | 0.30% | 0.30% |
| 24 | P1 | 29-abr-22 | 0.001333 | Q1-22 | 25 | -0.002 | 0.30% | 0.30% |
| 25 | P2 | 17-may-22 | 0.002333 | Q1-22 | 26 | 0.003 | 0.20% | 0.20% |
| 26 | Oficial | 08-jun-22 | NaN | Q1-22 | 27 | 0.006 | 0.30% | 0.30% |
# visualization of the original data
fn.visualizacion(dt.data_PIB)
We can see that the data are measured at the same frequency (quarterly) and show an upward trend. Because there are fluctuations, we decided to smooth the time series with a moving average in order to make a better prediction.
Moving Average:
We executed a 3-period moving average.
# visualization with the moving average
fn.visualizacion2(dt.data_PIB)
# comparison with and without the moving average
fn.visualizacion3(dt.data_PIB)
The abrupt changes are noticeably reduced by the moving average with a 3-period window, which allows a better fit to the data.
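As a minimal sketch of the smoothing step (the series below is just the first five ACTUAL readings from the table above, used for illustration; the window alignment of the project's own "Promedio móvil" column may differ):

```python
# 3-period moving average with pandas on an illustrative series
import pandas as pd

actual = pd.Series([-0.038, -0.038, -0.036, -0.121, -0.121])
movil = actual.rolling(window=3).mean()  # first two entries are NaN
print(movil.round(6).tolist())
```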
A time series is a sequence of observations, generally measured at equally spaced intervals, in this case we use data from the last few years of Euro area GDP.
Descriptive classification and stationarity:
In the time series there are two deterministic components: trend component and seasonal components, while there is also a random component that doesn't correspond to any behavior pattern, but is the result of random factors that have an isolated effect. A time series is denoted:
$$X_t = T_t + E_t + I_t$$

However, time series can also be classified as stationary or non-stationary according to their behavior. The Eurozone GDP, being an economic indicator, is not expected to be stationary: its mean varies over the period, it does not have constant variance, and the graphs show that the series does not oscillate around a constant mean. Therefore, our series is non-stationary.
For the same reasons, it does not behave like white noise since it does not have a zero mean and constant variance.
Seasonality:
Seasonality is a behavior or pattern that we sometimes observe in a time series. It consists of periodic ups and downs that occur regularly in the time series. In the period that we study of the Euro area GDP for this analysis, none is detected.
Autocorrelation:
Correlation in time series, also known as autocorrelation or serial correlation, is a statistical method that quantifies the linear relationship between two variables. It can be measured through the autocorrelation function and the partial autocorrelation function. Autocorrelation function:
$$\rho_k = corr(X_t, X_{t-k}) = \frac{cov(X_t, X_{t-k})}{\sqrt{V(X_t)}\sqrt{V(X_{t-k})}}$$

Properties:

$\rho_0 = 1$

$-1 \le \rho_k \le 1$

Symmetry: $\rho_k = \rho_{-k}$
Partial autocorrelation measures the correlation that exists between two variables separated by k periods when the dependency created by the intermediate lags between them is not considered.
$$\pi_k = corr(X_t, X_{t-k} \mid X_{t-1}, X_{t-2}, \ldots, X_{t-k+1})$$

$$\pi_k = \frac{cov(X_t - \hat{X}_t,\; X_{t-k} - \hat{X}_{t-k})}{\sqrt{V(X_t - \hat{X}_t)}\sqrt{V(X_{t-k} - \hat{X}_{t-k})}}$$

# Pearson correlation coefficient
fn.correlacion
| | FECHA | Promedio móvil |
|---|---|---|
| FECHA | 1.000000 | 0.290555 |
| Promedio móvil | 0.290555 | 1.000000 |
The correlation obtained for our data series is 0.290555; being positive, it means that the values tend to increase together. This value was obtained with the Pearson correlation function in Python; we also calculated the Spearman correlation, whose result was:
fn.correlacion_spearman
SpearmanrResult(correlation=0.31204311659537626, pvalue=0.12887653216320152)
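Both coefficients can be computed directly with scipy.stats; the sketch below uses an illustrative upward-trending pair of series rather than the project's data:

```python
# Pearson and Spearman correlation with SciPy on illustrative data
import numpy as np
from scipy import stats

x = np.arange(10, dtype=float)  # stand-in for the period index
y = np.array([1.0, 2.1, 1.9, 3.2, 2.8, 4.1, 3.9, 5.2, 4.8, 6.1])

r_pearson, p_pearson = stats.pearsonr(x, y)
rho, p_spearman = stats.spearmanr(x, y)
# Both coefficients are positive: the two series rise together
print(round(r_pearson, 3), round(rho, 3))
```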
For the partial correlation we added a new column to the dataframe containing the previous value of the moving average, so we can obtain the relationship of the moving average with respect to the period while controlling for this third variable. Using the pingouin function partial_corr in Python we get the following:
fn.correlacion_parcial
| | n | r | CI95% | p-val |
|---|---|---|---|---|
| pearson | 24 | 0.030996 | [-0.39, 0.44] | 0.888348 |
The partial correlation is 0.030996 and in the following table the correlations between each variable were obtained:
fn.correlacion_parcial2
| | FECHA | Promedio móvil | Shift |
|---|---|---|---|
| FECHA | 1.000 | 0.031 | 0.186 |
| Promedio móvil | 0.031 | 1.000 | 0.736 |
| Shift | 0.186 | 0.736 | 1.000 |
Homoscedasticity and heteroscedasticity:
Heteroscedasticity occurs when the variance of the errors is not constant throughout a sample; it is the opposite of homoscedasticity. We carried out different tests to determine whether there is homoscedasticity or heteroscedasticity, obtaining the following results.

Taking as hypotheses:

$H_0$: homoscedasticity

$H_1$: heteroscedasticity
Fligner test:
fn.fligner_test
FlignerResult(statistic=28.375563580506306, pvalue=9.991861279429262e-08)
Levene test:
fn.levene_test
LeveneResult(statistic=70.81092169626352, pvalue=5.235575011761217e-11)
Bartlett test:
fn.bartlett_test
BartlettResult(statistic=202.60336712204736, pvalue=5.645947197265984e-46)
All three tests give a p-value < 0.05, so there is evidence to reject the null hypothesis of equal variance; therefore there is heteroscedasticity.
Normal:
Quantile-Quantile graph to corroborate normality
data3=dt.data_PIB.drop([0,26])
fn.visualizacion4(data3)
The graph is not conclusive, so we tested for normality with the Shapiro-Wilk test, which posits the null hypothesis that a sample comes from a normal distribution. We choose a significance level, usually 0.05; the alternative hypothesis is that the distribution is not normal.
$H_0:$ distribution is normal
$H_1:$ distribution is not normal
fn.shapiro_test
ShapiroResult(statistic=0.9160053133964539, pvalue=0.0415816493332386)
We performed the test in Python through the shapiro function of scipy.stats, obtaining a Shapiro statistic of 0.9160 and a p-value of 0.0416. Since the p-value is below 0.05, we reject the hypothesis that the distribution is normal.
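The same check can be sketched with scipy.stats.shapiro on an illustrative, clearly non-normal sample (assumed data, not the GDP series):

```python
# Shapiro-Wilk normality test on a skewed (exponential) sample
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=100)  # clearly non-normal

stat, pval = stats.shapiro(sample)
if pval < 0.05:
    print("Reject H0: the sample does not look normally distributed")
```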
The Central Limit Theorem shows that the distribution of the mean of the data from any distribution approaches the normal distribution as the sample size increases. Therefore, if one wants to make inferences about data without a normal distribution, the assumption of normality is not essential as long as the sample is large enough.
Linear regression is a statistical technique used to predict one quantitative variable as a function of others. The linear model is given by the following equation:
$$y = b_0 + b_1x_1 + b_2x_2 + \ldots + b_nx_n + u$$

where $b_1, b_2, \ldots, b_n$ are the coefficients or parameters that denote the magnitude of the effect that the independent variables $x_1, x_2, \ldots, x_n$ have on the dependent variable $y$. The coefficient $b_0$ is the constant or independent term of the model, and $u$ is the term that represents the model error.
fn.polinomio(data3)
coefficient of determination: 0.9035562928021313 intercept: -1.8070397662708915 coefficients: [ 2.46076781e+00 -1.30685021e+00 3.46452180e-01 -5.17272019e-02 4.63347222e-03 -2.54363804e-04 8.38471535e-06 -1.52398225e-07 1.17423427e-09]
We performed a regression on the smoothed data, trying different models; the one that fit best was a polynomial model, a type of linear regression, of degree 9, with a coefficient of determination of 0.9035.
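A degree-9 fit of this kind can be sketched with NumPy's polynomial utilities; the series below is synthetic (the project fitted the smoothed GDP data):

```python
# Degree-9 polynomial regression and its coefficient of determination
import numpy as np

x = np.arange(1, 26, dtype=float)      # stand-in for the period index
y = 0.01 * np.sin(x / 4.0) - 0.02      # stand-in for the smoothed series

# Polynomial.fit rescales x internally, keeping the fit well conditioned
model = np.polynomial.Polynomial.fit(x, y, deg=9)

ss_res = np.sum((y - model(x)) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot             # coefficient of determination
print(round(r2, 4))
```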
Polynomial:
$$y = -1.8070 + 2.4608X - 1.3069X^2 + 3.4645\times10^{-1}X^3 - 5.1727\times10^{-2}X^4 + 4.6335\times10^{-3}X^5 - 2.5436\times10^{-4}X^6 + 8.3847\times10^{-6}X^7 - 1.5240\times10^{-7}X^8 + 1.1742\times10^{-9}X^9$$

When characterizing the scenarios of occurrence within the context of fundamental analysis, we refer to the status of the macroeconomic indicator we are using as reference. As such, we can characterize four possible scenarios when performing fundamental analysis using the Gross Domestic Product results:
| Scenario | Rule |
|---|---|
| A | Actual >= Consensus >= Previous |
| B | Actual >= Consensus < Previous |
| C | Actual < Consensus >= Previous |
| D | Actual < Consensus < Previous |
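The four rules above map to a small classification function; the sketch below is illustrative (the project's own implementation lives in ft.escenarios_ocurrencia):

```python
# Scenario classification for a GDP release, following the table above

def escenario(actual, consenso, previo):
    """Return scenario A, B, C or D for one announcement."""
    if actual >= consenso:
        return "A" if consenso >= previo else "B"
    return "C" if consenso >= previo else "D"

# Values from the first row of the scenarios table (Q1-20, P1)
print(escenario(-0.038, -0.035, 0.001))  # D
```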
# Importing data
import numpy as np
import functions as ft
import pandas as pd
gdp_df = dt.dfA
gdp_df = gdp_df.sort_index()
escenarios = ft.escenarios_ocurrencia(gdp_df)
escenarios.head()
| timestamp | Caracteristica | Periodo | Fecha | Actual | Consenso | Previo | Escenario |
|---|---|---|---|---|---|---|---|
| 2020-04-30 09:00:00 | P1 | Q1-20 | 1 | -0.038 | -0.035 | 0.001 | D |
| 2020-05-15 09:00:00 | P2 | Q1-20 | 2 | -0.038 | -0.038 | -0.038 | A |
| 2020-07-31 09:00:00 | P1 | Q2-20 | 4 | -0.121 | -0.120 | -0.036 | D |
| 2020-08-09 09:00:00 | Oficial | Q2-20 | 6 | -0.118 | -0.121 | -0.035 | B |
| 2020-08-12 09:00:00 | Oficial | Q3-20 | 9 | 0.125 | 0.126 | -0.116 | C |
Now, we need to observe the behaviour of the currency based on each scenario occurrence, 30 minutes before the announcement and 30 minutes after the announcement.
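As a hedged sketch of the pip arithmetic behind these metrics (assuming the EUR/USD convention of 1 pip = 0.0001; the function and prices are illustrative, not the project's):

```python
# Price move around an announcement window expressed in pips

def pips(price_from, price_to, pip_size=0.0001):
    """Signed move from price_from to price_to in pips."""
    return round((price_to - price_from) / pip_size, 1)

# e.g. a rise from 1.1000 to 1.1135 over the window is a 135-pip move
print(pips(1.1000, 1.1135))  # 135.0
```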
Summarizing the results into metrics we can digest:
currency_df = dt.dfB
pares = ft.pares(gdp_df)
dic = ft.trading_history(gdp_df, currency_df, pares)
y = ft.pip_Metrics(escenarios,dic)
y.head()
| timestamp | Escenario | Direction | Pips Alcistas | Pips Bajistas | Volatilidad |
|---|---|---|---|---|---|
| 2020-04-30 09:00:00 | D | 1 | 135.0 | -115.0 | 181.0 |
| 2020-05-15 09:00:00 | A | -1 | 22.0 | -6.0 | 117.0 |
| 2020-07-31 09:00:00 | D | -1 | -80.0 | 24.0 | 311.0 |
| 2020-08-09 09:00:00 | B | -1 | -17.0 | -12.0 | 154.0 |
| 2020-08-12 09:00:00 | C | -1 | 67.0 | -35.0 | 178.0 |
Backtesting is the standard procedure for testing an investing or trading strategy. It consists of checking whether your strategy would have held up against the historical data of the asset or portfolio you are looking to invest in.

In this case, the strategy was tested using an 80/20 rule: first testing it on 80% of the historical data we had, then validating it against the remaining 20%.
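The 80/20 segmentation can be sketched as a simple chronological split (illustrative only; the project's ft.segmentar may differ in its details):

```python
# Chronological 80/20 split of a backtest history
import pandas as pd

def split_80_20(df):
    """First 80% of rows for testing, last 20% for validation."""
    cut = int(len(df) * 0.8)
    return df.iloc[:cut], df.iloc[cut:]

history = pd.DataFrame({"profit": range(10)})  # stand-in data
test_set, val_set = split_80_20(history)
print(len(test_set), len(val_set))  # 8 2
```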
The strategy to be backtested was the following:
df_decisiones = pd.DataFrame({
"Escenario" : ["A","B","C","D"],
"Operación" : ["Compra","Venta","Compra","Venta"],
"Stop Loss (Pips)" : [50,50,50,50],
"Take Profit (pips)" : [150,150,150,150],
"Volumen" : [100,50,50,100]
})
df_decisiones
| | Escenario | Operación | Stop Loss (Pips) | Take Profit (pips) | Volumen |
|---|---|---|---|---|---|
| 0 | A | Compra | 50 | 150 | 100 |
| 1 | B | Venta | 50 | 150 | 50 |
| 2 | C | Compra | 50 | 150 | 50 |
| 3 | D | Venta | 50 | 150 | 100 |
backtest_df = ft.backtest(escenarios,y,50,10)
test,val = ft.segmentar(backtest_df)
perf_test = ft.performance(test)
perf_val = ft.performance(val)
perf_test
| | Ratio Éxito (%) | Retorno de Capital (%) | Promedio Ganancias (%) | Volatilidad Ganancias (%) |
|---|---|---|---|---|
| 0 | 66.666667 | 76.58 | 3.646667 | 6.585885 |
perf_val
| | Ratio Éxito (%) | Retorno de Capital (%) | Promedio Ganancias (%) | Volatilidad Ganancias (%) |
|---|---|---|---|---|
| 0 | 100.0 | 21.053539 | 5.263385 | 1.131783 |
As we can see, the results of the strategy were very promising.
import visualizations as vis
import plotly.io as pio
test_line = vis.backtest_evolution_chart(test.sort_index(),"Strategy Test")
test_bar = vis.backtest_strat_result(test.sort_index(),"Strategy Test")
test_line.show(renderer = "notebook")
test_bar.show(renderer = "notebook")
As the results show, this strategy is profitable. Although there is a loss of capital at the beginning of the implementation, it is later more than recovered. This loss is attributed to buy entry signals whose setups were ultimately undone by the financial crisis the coronavirus caused in the markets. Also, because buy entry signals are more frequent, most of the operations, and therefore most of the successful ones, are long trades, although there are also successful short sales.
import visualizations as vis
import plotly.io as pio
val_line = vis.backtest_evolution_chart(val.sort_index(),"Strategy Validation")
val_bar = vis.backtest_strat_result(val.sort_index(),"Strategy Validation")
val_line.show(renderer = "notebook")
val_bar.show(renderer = "notebook")
In the validation, only 4 operations are carried out, because the operations are widely dispersed, but all of them are predicted successfully.
This project tested our abilities and knowledge not only assignment-wise but career-wise. We faced several challenges when elaborating the trading strategy, such as finding an adequate amount with which to open positions, and finding a pattern strong enough to build a strategy on. The computing challenges were mainly data-related: historical intraday trading data is not easy to obtain, and it is heavy both in size and in processing time, as attempting to run this notebook or main.py will show.
We hope to implement similar work in the future, whether for financial gain or for strategy benchmarking.
Aldridge, Irene (2013). High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems, 2nd edition. Wiley. ISBN 978-1-118-34350-0.
Lemke and Lins (2015). "Soft Dollars and Other Trading Activities," § 2:30. Thomson West, 2015–2016 ed.
The New Financial Industry. Alabama Law Review. Available at: https://ssrn.com/abstract=2417988
Amat Rodrigo, Joaquín. Análisis de homocedasticidad y heterocedasticidad con python. Available under an Attribution 4.0 International (CC BY 4.0) license at: https://www.cienciadedatos.net/documentos/pystats07-test-homocedasticidad-heterocedasticidad-python.html
Santander (2022). ¿Qué es el PIB y por qué es importante en la economía? Retrieved 07/07/2022 from: https://www.santander.com
Rodriguez, N. (2011). Series de tiempo regresión. Retrieved 07/07/2022 from: https://es.slideshare.net/Norlan0987/series-de-tiempo-regresin
Hernández, S. (2015). Análisis de series de tiempo. Retrieved 07/07/2022 from: https://www.cepal.org/sites/default/files/courses/files/01_1_conociendo_una_serie_de_tiempo.pdf
[1] Munnoz (2020). Python project template. https://github.com/iffranciscome/python-project (2021).